Word sense discrimination in information retrieval: A spectral clustering-based approach
نویسندگان
چکیده
Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful in the case of queries which contain ambiguous terms. We are interested in improving the level of precision after 5, 10 and 30 retrieved documents (P@5, P@10, P@30) respectively. We show that precision can be improved by 8% above current state-of-the-art baselines. We also focus on poor performing queries.
منابع مشابه
بررسی نقش انواع بافتار همنویسهها در تعیین شباهت بین مدارک
Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation and thereby improving the effectiveness of the results retrieved. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided to different types, wit...
متن کاملTopical Clustering of MRD Senses Based on Information Retrieval Techniques
This paper describes a heuristic approach capable of automatically clustering senses in a machinereadable dictionary (MRD). Including these clusters in the MRD-based lexical database offers several positive benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser sense division, so unnecessarily fine sense distinction can be avoided. The clustered entries in th...
متن کاملName Discrimination and Email Clustering using Unsupervised Clustering and Labeling of Similar Contexts
In this paper, we apply an unsupervised word sense discrimination technique based on clustering similar contexts (Purandare and Pedersen, 2004) to the problems of name discrimination and email clustering. Names of people, places, and organizations are not always unique. This can create a problem when we refer to or seek out information about such entities. When this occurs in written text, we s...
متن کاملIntegrating Sense Discrimination in a Semantic Information Retrieval System
This paper proposes an Information Retrieval (IR) system that integrates sense discrimination and disambiguation to overcome the problem of word ambiguity.
متن کاملOntology Based Query Expansion Using Word Sense Disambiguation
The existing information retrieval techniques do not consider the context of the keywords present in the user’s queries. Therefore, the search engines sometimes do not provide sufficient information to the users. New methods based on the semantics of user keywords must be developed to search in the vast web space without incurring loss of information. The semantic based information retrieval te...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Inf. Process. Manage.
دوره 51 شماره
صفحات -
تاریخ انتشار 2015